pamelafox
Collaborator

Purpose

This pull request introduces a major refactor to the agentic retrieval integration, updating the codebase to use the latest Azure AI Search agentic retrieval API.

The new API can optionally include the reference source data (all the fields from each chunk), so we no longer need explicit hydration.
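
In rough terms, a reference returned by the agent can already carry the chunk's fields, so documents can be built straight from the response without a second lookup. The sketch below uses plain dictionaries rather than the SDK types; the `doc_key` and `source_data` key names, the `Document` fields, and the `documents_from_references` helper are all assumptions for illustration, not the code in approach.py.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Document:
    """Stand-in for the app's document type; fields are assumed for illustration."""
    id: Optional[str] = None
    content: Optional[str] = None
    sourcepage: Optional[str] = None


def documents_from_references(references: list[dict[str, Any]]) -> list[Document]:
    """Build documents directly from references that embed their source data."""
    docs = []
    for ref in references:
        # A reference may arrive without source data; treat that as empty.
        source = ref.get("source_data") or {}
        docs.append(
            Document(
                id=ref.get("doc_key"),
                content=source.get("content"),
                sourcepage=source.get("sourcepage"),
            )
        )
    return docs


# Illustrative references only, not real API output.
references = [
    {"doc_key": "file-1", "source_data": {"content": "hello", "sourcepage": "a.pdf#page=1"}},
    {"doc_key": "file-2", "source_data": None},
]
docs = documents_from_references(references)
print(docs)
```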

The new API does not support passing in max subqueries at query time, so I've removed that as a Developer Setting. That can only be customized in the search manager, at agent creation time.

This is the changelog for the package upgrade:
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md

And these are the API specs:
https://github.com/Azure/azure-rest-api-specs/blob/a71c94fb88b21af5c99442fd138b2570fc29622b/specification/search/data-plane/Azure.Search/preview/2025-08-01-preview/searchservice.json#L2701

Agentic retrieval API and data model updates:

  • Replaced legacy agentic retrieval classes and parameters (such as KnowledgeAgentAzureSearchDocReference, KnowledgeAgentIndexParams, and hydration logic) with new types (KnowledgeAgentSearchIndexReference, SearchIndexKnowledgeSourceParams, etc.) and simplified reference handling in approach.py. Removed unused hydration and reranker-related code. [1] [2] [3] [4]
  • Updated agent creation in searchmanager.py to use SearchIndexKnowledgeSource and KnowledgeSourceReference instead of KnowledgeAgentTargetIndex, and now explicitly selects source fields and reference options. [1] [2]

Parameter and code cleanup:

  • Removed the hydrate_references, minimum_reranker_score, and max_docs_for_reranker parameters from constructors and method calls in approach.py, chatreadretrieveread.py, and retrievethenread.py. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

Dependency updates:

  • Upgraded the azure-search-documents package to version 11.7.0b1 in both requirements.in and requirements.txt to support the new agent API features. [1] [2]

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[X] Yes - It may, if they are using agentic. Hopefully it won't because I gave the agent a new name (suffix of '-upgrade'), so it won't try to use the old agent with incompatible configuration.
[ ] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[X] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

@pamelafox pamelafox changed the title from "Upgrade to latest version of azure-search-documents and agentic retrieval API" to "[WIP] Upgrade to latest version of azure-search-documents and agentic retrieval API" on Sep 9, 2025
@pamelafox pamelafox marked this pull request as draft September 9, 2025 19:31
@taylorn-ai
Contributor

@pamelafox - where are you at with this? Do you need a hand with anything?

@pamelafox pamelafox changed the title from "[WIP] Upgrade to latest version of azure-search-documents and agentic retrieval API" to "Upgrade to latest version of azure-search-documents and agentic retrieval API" on Sep 10, 2025
@pamelafox
Collaborator Author

@taylorn-ai Tests just passed, and I just verified this is working with the multimodal feature, so this is ready for review! I'd love if you want to review the code and/or check out the branch to see if it works for you.

@pamelafox pamelafox marked this pull request as ready for review September 10, 2025 05:39
@pamelafox pamelafox requested a review from Copilot September 10, 2025 05:52
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This pull request upgrades the Azure Search Documents SDK to version 11.7.0b1 and refactors the agentic retrieval integration to use the latest API. The new API includes reference source data directly, eliminating the need for explicit hydration, and removes runtime customization of max subqueries.

Key changes include:

  • Upgraded azure-search-documents to 11.7.0b1 for latest agentic retrieval API support
  • Replaced legacy agentic retrieval classes with new API types and simplified reference handling
  • Removed max_subqueries parameter and hydration-related code as these are no longer supported

Reviewed Changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| app/backend/requirements.in | Updated azure-search-documents version to 11.7.0b1 |
| app/backend/requirements.txt | Updated dependencies with new azure-search-documents version |
| app/backend/approaches/approach.py | Replaced legacy agentic retrieval types with new API classes and removed hydration logic |
| app/backend/approaches/chatreadretrieveread.py | Removed hydrate_references parameter and max_docs_for_reranker calculations |
| app/backend/approaches/retrievethenread.py | Removed hydrate_references parameter and max_docs_for_reranker calculations |
| app/backend/prepdocslib/searchmanager.py | Updated agent creation to use SearchIndexKnowledgeSource and new reference types |
| app/backend/app.py | Removed ENABLE_AGENTIC_RETRIEVAL_SOURCE_DATA environment variable usage |
| app/frontend/src/api/models.ts | Removed max_subqueries from ChatAppRequestOverrides type |
| app/frontend/src/pages/chat/Chat.tsx | Removed max subqueries UI setting and state management |
| app/frontend/src/pages/ask/Ask.tsx | Removed max subqueries UI setting and state management |
| app/frontend/src/components/Settings/Settings.tsx | Removed max subqueries input field from developer settings |
| app/frontend/src/components/AnalysisPanel/AgentPlan.tsx | Updated activity record type names and property access |
| app/frontend/src/locales/*/translation.json | Removed max subqueries translations from all language files |
| infra/main.bicep | Removed enableAgenticRetrievalSourceData parameter and updated agent name suffix |
| infra/main.parameters.json | Removed ENABLE_AGENTIC_RETRIEVAL_SOURCE_DATA parameter mapping |
| docs/*.md | Updated documentation to remove compatibility warnings and max subqueries references |
| evals/*.json | Removed max_subqueries from evaluation configuration files |
| tests/ | Updated test mocks and removed hydration-related test cases |
Comments suppressed due to low confidence (1)

app/frontend/src/components/AnalysisPanel/AgentPlan.tsx:1

  • The AzureSearchQueryStep type includes a query_time field that doesn't appear to be used anywhere in the component. Consider removing this unused field or documenting why it's included if it's intended for future use.
import React from "react";

@taylorn-ai
Contributor

@pamelafox - looks good to me, nice work :)

I did notice, however, that many of the translation files are missing some keys. I only noticed this because you removed maxSubqueryCount from some lang files but not all, so I ran i18n-check.

| File | Key |
| --- | --- |
| src/locales/da/translation.json | labels.resultsMergeStrategy |
| src/locales/da/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/da/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/da/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/da/translation.json | helpTexts.llmTextInputs |
| src/locales/da/translation.json | helpTexts.llmImageInputs |
| src/locales/es/translation.json | labels.resultsMergeStrategy |
| src/locales/es/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/es/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/es/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/es/translation.json | helpTexts.llmTextInputs |
| src/locales/es/translation.json | helpTexts.llmImageInputs |
| src/locales/fr/translation.json | labels.resultsMergeStrategy |
| src/locales/fr/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/fr/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/fr/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/fr/translation.json | helpTexts.llmTextInputs |
| src/locales/fr/translation.json | helpTexts.llmImageInputs |
| src/locales/it/translation.json | labels.resultsMergeStrategy |
| src/locales/it/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/it/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/it/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/it/translation.json | helpTexts.llmTextInputs |
| src/locales/it/translation.json | helpTexts.llmImageInputs |
| src/locales/ja/translation.json | helpTexts.llmTextInputs |
| src/locales/ja/translation.json | helpTexts.llmImageInputs |
| src/locales/nl/translation.json | labels.resultsMergeStrategy |
| src/locales/nl/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/nl/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/nl/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/nl/translation.json | helpTexts.llmTextInputs |
| src/locales/nl/translation.json | helpTexts.llmImageInputs |
| src/locales/ptBR/translation.json | labels.resultsMergeStrategy |
| src/locales/ptBR/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/ptBR/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/ptBR/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/ptBR/translation.json | helpTexts.llmTextInputs |
| src/locales/ptBR/translation.json | helpTexts.llmImageInputs |
| src/locales/tr/translation.json | labels.resultsMergeStrategy |
| src/locales/tr/translation.json | labels.resultsMergeStrategyOptions.interleaved |
| src/locales/tr/translation.json | labels.resultsMergeStrategyOptions.descending |
| src/locales/tr/translation.json | helpTexts.resultsMergeStrategy |
| src/locales/tr/translation.json | helpTexts.llmTextInputs |
| src/locales/tr/translation.json | helpTexts.llmImageInputs |
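
A report like this can be approximated in a few lines of Python. The sketch below is not how i18n-check works internally; it only flattens each translation file into dotted keys and lists the ones present in a reference locale but absent from another. The `flatten_keys` and `missing_keys` helpers and the sample dictionaries are made up for illustration.

```python
from typing import Any


def flatten_keys(tree: dict[str, Any], prefix: str = "") -> set[str]:
    """Return dotted key paths for every non-dict leaf in a nested translation dict."""
    keys: set[str] = set()
    for name, value in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def missing_keys(reference: dict[str, Any], candidate: dict[str, Any]) -> list[str]:
    """Keys defined in the reference locale but absent from the candidate, sorted."""
    return sorted(flatten_keys(reference) - flatten_keys(candidate))


# Illustrative data only, not the real translation files.
en = {
    "labels": {"resultsMergeStrategy": "Results merge strategy"},
    "helpTexts": {"llmTextInputs": "...", "llmImageInputs": "..."},
}
ja = {"labels": {"resultsMergeStrategy": "..."}, "helpTexts": {}}

print(missing_keys(en, ja))
```

In practice the dictionaries would come from `json.load` on each `translation.json`, with one locale (presumably `en`) acting as the reference.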

@pamelafox
Collaborator Author

@taylorn-ai Oo thanks! I did not know about i18n-check, that sounds like a new CI check that we need.
Let me generate those translations and see if I can get some human reviewers to check them.

@taylorn-ai
Contributor

The one issue I have with it is that it uses i18next-parser, which in turn uses some outdated dependencies (see here), but for dev that's not really an issue.

Also, I used i18n-auto-translation to translate, and you can even use Azure AI Translation with it, just thought I would mention that too :)

@pamelafox
Collaborator Author

pamelafox commented Sep 10, 2025

@taylorn-ai Yep, you're right, it does have a bunch of dependency warnings. I've added it to the CI using npx so that it doesn't have to go in the package.json at all. I generated the translations with GPT-5 in Copilot, which usually does a decent job, but I'll ping some human i18n reviewers too.

@taylorn-ai
Contributor

taylorn-ai commented Sep 10, 2025

Actually, something I just noticed, instead of hard coding the field names, maybe they should be fetched dynamically?

e.g.

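from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient

# endpoint and index_name are assumed to be defined elsewhere
client = SearchIndexClient(endpoint=endpoint, credential=DefaultAzureCredential())
index = client.get_index(index_name)
# Keep only the fields marked searchable in the index definition
field_names = [f.name for f in index.fields if f.searchable]
...
source_data_select = ",".join(field_names)
...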

Or, perhaps better...

from dataclasses import fields
from approaches.approach import Document
# Fields to exclude from the select list
skip_fields = {"score", "reranker_score", "search_agent_query"}
search_fields = [f.name for f in fields(Document) if f.name not in skip_fields]
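
The dataclass idea can be tried in isolation. The following is a sketch with a stand-in `Document` dataclass; the real class in approaches/approach.py has its own fields, so the field names here, along with `SKIP_FIELDS` and `selectable_fields`, are placeholders.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Document:
    """Stand-in for the app's Document class; these fields are assumed for illustration."""
    id: Optional[str] = None
    content: Optional[str] = None
    sourcepage: Optional[str] = None
    score: Optional[float] = None
    reranker_score: Optional[float] = None
    search_agent_query: Optional[str] = None


# Fields assumed not to be stored in the index, so they are not requested.
SKIP_FIELDS = {"score", "reranker_score", "search_agent_query"}


def selectable_fields(cls: type) -> list[str]:
    """Names of dataclass fields to request as source data, in declaration order."""
    return [f.name for f in fields(cls) if f.name not in SKIP_FIELDS]


source_data_select = ",".join(selectable_fields(Document))
print(source_data_select)
```

One trade-off to keep in mind: this ties the select list to the dataclass, so any dataclass field that does not exist in the index would need to be added to the skip set.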

@taylorn-ai
Copy link
Contributor

@pamelafox - sorry for the spam, but I did actually notice an issue, not specifically related to this PR, but it made me remember.

It seems that at some point @search.reranker_score was renamed to @search.rerankerScore. All your tests pass because you use reranker_score as the field name, but I would imagine that in your Document class it's likely not returning anything, as it uses reranker_score=document.get("@search.reranker_score").

Co-authored-by: Gwen Peña-Siguenza <[email protected]>
Co-authored-by: Wassim Chegham <[email protected]>
Co-authored-by: Anthony Shaw <[email protected]>
@pamelafox
Collaborator Author

@taylorn-ai Hm, I just printed out the values in approach.py from AI Search (non-agentic), and it shows the score for @search.reranker_score, but a null value for @search.rerankerScore.
Where do you see that it got renamed?

@HeidiSteen HeidiSteen merged commit 305ab5b into Azure-Samples:main Sep 10, 2025
29 checks passed
@pamelafox
Collaborator Author

@taylorn-ai The PR is merged, but do follow up on rerankerScore if you still see an issue (here or in a new issue).

@taylorn-ai
Contributor

The documentation says that the field is called @search.rerankerScore, and when you use the search index browser in Azure it returns it the same way:

"@search.score": 36.73281,
"@search.rerankerScore": 2.6607306003570557,

However, after looking a bit further, it seems just the SDK returns it as reranker_score, so you are correct, my bad - just very confusing!

@pamelafox
Collaborator Author

@taylorn-ai I asked @mattgotteiner and he says that's due to the Python SDK explicitly snake_casing the API return values.
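
To make the naming difference concrete, here is a rough sketch of a camelCase to snake_case conversion. It is not the SDK's code, only an illustration of how `@search.rerankerScore` in the REST response lines up with `@search.reranker_score` in Python; the `to_snake_case` helper is hypothetical.

```python
import re


def to_snake_case(name: str) -> str:
    """Put an underscore before each capital that follows a lowercase letter or digit, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Keys as shown by the search index browser in the thread above.
rest_response = {"@search.score": 36.73281, "@search.rerankerScore": 2.6607306003570557}

# The same values under snake_cased keys, as the Python code reads them.
python_view = {to_snake_case(key): value for key, value in rest_response.items()}
print(sorted(python_view))
```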
